library(rvest)
library(tidyverse)
library(knitr)
library(plyr)
library(dplyr)
library(jsonlite)
library(lubridate)
library(RSelenium)
We present two methods to capture departure and arrival data for an airport on the FlightRadar website: using a headless browser, and using XHR requests.
For each airport, the FlightRadar website offers a general information page plus departure and arrival flight pages. In this tutorial we scrape the departures and arrivals pages of Bordeaux-Mérignac Airport (BOD).
If you open the departures page, you will see two interesting buttons, one at the top of the page and one at the bottom.
To display all the available data (roughly 24 hours of past and future departures/arrivals), we simulate repeated clicks on these two buttons, and stop only when the buttons disappear from the page.
Because webmasters put defences in place to protect some data, you need to simulate human behaviour, ideally with a real browser.
In short, Selenium is a multi-tool project focused on automating tasks to test web applications. It works with many web browsers and operating systems.
Selenium WebDriver gives developers an API to drive a web browser, which can run headless (without a visible window). You can use this API from your favourite language (Java, Python, R, etc.) to send commands to the browser: navigate, move the mouse, click DOM elements, send keystrokes to input forms, inject JavaScript, take screenshots of the page, extract HTML, etc.
First, install and load the RSelenium package, the R bindings for the Selenium WebDriver API:
install.packages("devtools")
devtools::install_github("ropensci/RSelenium")
Depending on your existing configuration and OS, you may need to install some additional dependencies.
It is possible to use Selenium directly with your browser, but we prefer to use the server version. Why? Because with a Selenium server you can a) send commands to a local or remote server running Selenium, b) which can run different browsers and/or operating systems, and c) distribute tests over multiple machines.
Selenium is a fast-moving project and some releases are really buggy, so try to choose a stable version, and don't despair.
Install Docker on your OS by following the Docker instructions at the end of this document.
Once that is done, we pull and run one of the Docker Selenium Server images from a terminal. For this tutorial we use Firefox!
In the usual setting (with a good internet connection), we pull images directly from Docker Hub:
sudo docker pull selenium/standalone-firefox:3.14.0-arsenic
However, because the images are large (about 1 GB for the two images used in this tutorial), we prefer to load them directly from the USB key provided by your teachers. Open a terminal in the folder where the images are located:
sudo docker load --input=r-alpine.tar
sudo docker load --input=rSelenium.tar
Create the Selenium container:
sudo docker run --shm-size=2g --name selenium -d -p 4445:4444 selenium/standalone-firefox:3.14.0-arsenic
Type sudo docker ps to check that the server is running. The -p 4445:4444 option publishes the container's Selenium port 4444 on port 4445 of your machine.
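If you want to run the same check from R before calling remoteDriver(), a minimal sketch is to try opening a plain TCP socket on the published port. The helper name port_open() is hypothetical and not part of RSelenium. It only tells you that something is listening on the port, not that Selenium is healthy.

```r
# Hypothetical helper (not part of RSelenium): check from R that something
# is listening on the host port mapped to the Selenium container (4445 here).
port_open <- function(host = "localhost", port = 4445L, timeout = 2) {
  con <- tryCatch(
    socketConnection(host = host, port = port,
                     blocking = TRUE, timeout = timeout),
    error = function(e) NULL,   # connection refused, unknown host, ...
    warning = function(w) NULL  # R warns just before failing to open a socket
  )
  if (is.null(con)) return(FALSE)
  close(con)                    # we only wanted to know the port answers
  TRUE
}

# Usage, once the selenium container is started:
# port_open("localhost", 4445L)
```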
Connect to the server and open the browser:
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L)
remDr$open()
## [1] "Connecting to remote server"
## $acceptInsecureCerts
## [1] FALSE
##
## $browserName
## [1] "firefox"
##
## $browserVersion
## [1] "61.0.1"
##
## $`moz:accessibilityChecks`
## [1] FALSE
##
## $`moz:headless`
## [1] FALSE
##
## $`moz:processID`
## [1] 288
##
## $`moz:profile`
## [1] "/tmp/rust_mozprofile.xqLsGR2Zk891"
##
## $`moz:useNonSpecCompliantPointerOrigin`
## [1] FALSE
##
## $`moz:webdriverClick`
## [1] TRUE
##
## $pageLoadStrategy
## [1] "normal"
##
## $platformName
## [1] "linux"
##
## $platformVersion
## [1] "4.15.0-34-generic"
##
## $rotatable
## [1] FALSE
##
## $timeouts
## $timeouts$implicit
## [1] 0
##
## $timeouts$pageLoad
## [1] 300000
##
## $timeouts$script
## [1] 30000
##
##
## $webdriver.remote.sessionid
## [1] "2d803d97-f908-44c6-afee-e3ecb1d7268b"
##
## $id
## [1] "2d803d97-f908-44c6-afee-e3ecb1d7268b"
remDr$maxWindowSize()
John Harrison, the creator and first committer of RSelenium, the R binding library for Selenium, wrote an extensive tutorial covering many commands: https://rpubs.com/johndharrison/RSelenium-Basics
Some of them:
- remDr$maxWindowSize(): maximize the browser window.
- remDr$navigate("https://www.google.fr"): navigate to a URL.
- remDr$screenshot(display = TRUE): take a screenshot of the web page and display it in the RStudio Viewer.
- remDr$findElement(...): find an element in the HTML structure using various strategies (XPath, CSS, etc.).
- remDr$executeScript(...): execute a JavaScript script in the remote browser.

Open the Web Developer tools of your favourite browser on the BOD arrivals page: https://www.flightradar24.com/data/airports/bod/arrivals
Let's investigate what happens in the HTML code when the load earlier or load later button is clicked. Why? To understand how we could automate things.
We want to automate clicks on these two buttons, so we need to understand WHEN to stop clicking :) If we click indefinitely, an error will probably be triggered once one of the two buttons disappears.
Select the element selector tool and click on the load earlier flights button.
If you clicked the right thing, the part of the HTML code that interests us is now highlighted:
Now, if you highlight and click the load later flights button with the selector tool, you get something like this:
There is not much difference between these two button objects: only the timestamp, the data page number and the button text seem to change…
Highlight and click the load earlier flights button once more. Click it again to load a new page of data. You can see that the HTML code changes while the data loads, to disable clicks on the button. Not very interesting. Now keep clicking, and stop only when the button disappears from your screen.
Great, a new CSS style attribute appears to indicate that this button object is now hidden: style="display: none;"
How can we reuse this important information during data harvesting to detect whether a button is enabled or disabled? The best solution is an XPath query!
Load the page in the Selenium server:
remDr$navigate("https://www.flightradar24.com/data/airports/bod/arrivals")
Sys.sleep(5) # time to load !
remDr$screenshot(file = "screenshoot.png")
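The fixed Sys.sleep(5) above works, but the loading time varies with the network. A common alternative is to poll until a condition holds or a timeout expires. The following is a minimal sketch. The helper wait_for() and its parameters are hypothetical, not an RSelenium function, and any error raised by the condition is treated as "not ready yet".

```r
# Hypothetical polling helper: re-evaluate condition() until it returns TRUE
# or the timeout (in seconds) expires. Errors raised by condition(), e.g. an
# element that is not in the DOM yet, are treated as "not ready".
wait_for <- function(condition, timeout = 30, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ready <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ready) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Usage sketch with the remDr session opened above:
# wait_for(function() {
#   length(remDr$findElements(using = "xpath",
#     "//button[@class='btn btn-table-action btn-flights-load']")) > 0
# }, timeout = 20)
```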
Building a correct XPath expression can be difficult. A good way to test your XPath expressions is to try them interactively in the web developer console.
Click on the Console tab:
Type this in the console: $x("//button[@class='btn btn-table-action btn-flights-load']")
The result is an interactive array that you can expand as a tree if you want.
Click repeatedly until one of the load buttons disappears; now we try to select only the button that is still available. XPath understands boolean operators (and, or) and the not() function, so we filter on @class and exclude any button whose style attribute contains 'display: none;':
$x("//button[@class='btn btn-table-action btn-flights-load' and not(contains(@style,'display: none;'))]")
Great, this query returns only the visible button. We will use it later to stop our infernal button-clicking loop.
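You can also check the XPath predicate offline, without the Selenium server, with xml2 (installed as a dependency of rvest). The markup below is a simplified, hypothetical stand-in for the two FlightRadar buttons after the "earlier" one has been exhausted. It is not the real page source.

```r
library(xml2)

# Simplified, hypothetical markup: the first button is hidden the way
# FlightRadar hides an exhausted load button (style="display: none;").
html <- '<div>
  <button class="btn btn-table-action btn-flights-load" style="display: none;">Load earlier flights</button>
  <button class="btn btn-table-action btn-flights-load">Load later flights</button>
</div>'

# Same predicate as the console test: right class AND not hidden.
# A button without any style attribute also passes not(contains(...)).
visible_xpath <- paste0(
  "//button[@class='btn btn-table-action btn-flights-load'",
  " and not(contains(@style,'display: none;'))]"
)

doc <- read_html(html)
visible <- xml_find_all(doc, visible_xpath)
xml_text(visible)  # "Load later flights"
```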
Now we build the same query with RSelenium, using the findElements() function:
loadmorebutton <- remDr$findElements(using = 'xpath', "//button[@class='btn btn-table-action btn-flights-load' and not(contains(@style,'display: none;'))]")
Display the text of each element returned by findElements() using the getElementText() method:
unlist(lapply(loadmorebutton, function(x){x$getElementText()}))
## [1] "Load earlier flights" "Load later flights"
Now, how do we simulate a click on one of these buttons?
An easy way is to call the clickElement() method on the first element of loadmorebutton:
tryCatch({
suppressMessages({
loadmorebutton[[1]]$clickElement()})},
error = function(e) {
loadmorebutton[[1]]$errorDetails()$message
})
This command returns an error message (if not, you're lucky!) that is not very explicit; for more details, call the errorDetails() function as in our tryCatch block.
An element of the web page overlaps our button, so the browser tells us that it cannot click this web element. Use the screenshot function to look at the page:
remDr$screenshot(file = 'screenshoot_overlap.png' )
If we hide these elements using XPath and JavaScript injection, everything goes back to normal. First, we accept the cookies by closing the cookie banner:
hideCookie <- function (x){
cookiesButton <- x$findElement(using = 'xpath',"//div[@class='important-banner__close']")
cookiesButton$clickElement()
}
hideCookie(remDr)
remDr$screenshot(file = 'screenshoot_hide.png')
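Note that hideCookie() fails if the banner is already gone, because findElement() raises an error when nothing matches. A more forgiving variant, sketched below under that observation, uses findElements(), which returns an empty list instead. The name hideCookieIfPresent() is our own. The test stand-ins only mimic the two RSelenium methods used here.

```r
# Sketch: close the cookie banner only if it is present.
# findElements() returns an empty list when nothing matches, whereas
# findElement() raises an error, so this version is safe to call twice.
hideCookieIfPresent <- function(x) {
  banners <- x$findElements(using = 'xpath',
                            "//div[@class='important-banner__close']")
  if (length(banners) == 0) return(invisible(FALSE))  # nothing to close
  banners[[1]]$clickElement()
  invisible(TRUE)
}

# Usage with the session opened above:
# hideCookieIfPresent(remDr)
```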
The navbar element also causes problems, so we hide it using JavaScript injection:
hideNavBar <- function (x) {
script <- "document.getElementById('navContainer').hidden = true;"
x$executeScript(script)
}
hideNavBar(remDr)
## list()
Now clickElement() works without problems :)
tryCatch({
suppressMessages({
loadmorebutton[[1]]$clickElement()})},
error = function(e) {
remDr$errorDetails()$message
})
Compare the page before and after with the remDr$screenshot(display = TRUE) command.
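Putting the pieces together, here is a minimal sketch of the click-until-gone loop described at the start of this tutorial. It assumes the cookie banner and the navbar have already been hidden with hideCookie() and hideNavBar(). The function click_until_gone(), its max_attempts cap and the two-second pause are our own choices, not part of RSelenium. The loop clicks only the first visible button per iteration and then re-queries, so it never reuses an element the page may have re-rendered. A failed click (for instance while the button is disabled during loading) is simply retried on the next pass.

```r
# Sketch: click visible "load" buttons until none remain.
# find_buttons() must return a (possibly empty) list of elements and
# click(el) must click one element; both are passed in so the loop can be
# driven by RSelenium or by stand-in functions.
click_until_gone <- function(find_buttons, click, max_attempts = 50, pause = 2) {
  clicks <- 0
  for (i in seq_len(max_attempts)) {      # hard cap: never loop forever
    buttons <- find_buttons()
    if (length(buttons) == 0) break       # every button is hidden: stop
    ok <- tryCatch({ click(buttons[[1]]); TRUE },
                   error = function(e) FALSE)  # e.g. button still disabled
    if (ok) clicks <- clicks + 1
    Sys.sleep(pause)                      # let the next page of data load
  }
  clicks
}

# Usage with RSelenium:
# loadXpath <- "//button[@class='btn btn-table-action btn-flights-load' and not(contains(@style,'display: none;'))]"
# click_until_gone(
#   find_buttons = function() remDr$findElements(using = "xpath", loadXpath),
#   click = function(el) el$clickElement()
# )
```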
Background on requesting JSON over HTTP: https://www.w3schools.com/js/js_json_http.asp
Sometimes a defence is also a weak point. Many sites use an internal API to query data and feed the website.
Let's see whether this is the case with FlightRadar :)
Open the dev tools in the browser, click on the Network tab, then on the XHR filter.
Lucky you, do you see it? Each GET request calls an airport.json file on the server:
https://api.flightradar24.com/common/v1/airport.json?code=bod&plugin[]=&plugin-setting[schedule][mode]=&plugin-setting[schedule][timestamp]=1537297562&page=1&limit=100&token=
If we decompose the query, we have:
- an airport code: bod
- a timestamp: 1537297562
- a page number: 1
- a limit per page: 100
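Since only these four parameters vary, a small helper can rebuild the request URL for any airport, time or page. This is a sketch. The function name airport_url() is hypothetical, and the mode and token parameters are left empty exactly as in the captured request.

```r
# Hypothetical helper rebuilding the captured XHR URL from its four
# variable parameters; mode and token stay empty as in the original request.
airport_url <- function(code, timestamp, page = 1, limit = 100) {
  sprintf(paste0(
    "https://api.flightradar24.com/common/v1/airport.json?code=%s",
    "&plugin[]=&plugin-setting[schedule][mode]=",
    "&plugin-setting[schedule][timestamp]=%.0f",  # whole seconds, no exponent
    "&page=%d&limit=%d&token="),
    code, timestamp, as.integer(page), as.integer(limit))
}

airport_url("bod", 1537297562)
```

Using %.0f for the timestamp keeps a whole number of seconds and avoids R printing large numbers in scientific notation.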
Copy and paste this URL into your browser to see how the resulting JSON is structured. The interesting data is located under result > response > airport > pluginData > schedule > arrivals:
- item: total number of items
- page: current page and number of pages
- timestamp: date of capture
- data: a list of 100 flights corresponding to the current page

We download the JSON data and convert it to a data.frame using the wonderful jsonlite package :) Why wonderful? Because jsonlite has an option to flatten the JSON structure, which would otherwise give data.frames nested in data.frames nested in data.frames…
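To see what flatten = TRUE changes without querying the API, here is a small sketch on a toy JSON string. The airline values are illustrative, not real API output. Only the nesting path result > response > airport > pluginData > schedule > arrivals > data comes from this tutorial's code.

```r
library(jsonlite)

# Toy document (illustrative values) following the tutorial's path:
# result > response > airport > pluginData > schedule > arrivals > data
toy <- '{"result":{"response":{"airport":{"pluginData":{"schedule":{"arrivals":{"data":[
  {"flight":{"airline":{"name":"Volotea","code":{"icao":"VOE"}}}},
  {"flight":{"airline":{"name":"Ryanair","code":{"icao":"RYR"}}}}
]}}}}}}}'

get_data <- function(json) json$result$response$airport$pluginData$schedule$arrivals$data

nested <- get_data(fromJSON(toy))                 # one column "flight" holding nested data.frames
flat   <- get_data(fromJSON(toy, flatten = TRUE)) # plain columns such as flight.airline.name

names(nested)
names(flat)
```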
timestamp <- as.integer(now())  # whole-second Unix timestamp, like 1537297562 in the captured URL
url <- paste("https://api.flightradar24.com/common/v1/airport.json?code=bod&plugin[]=&plugin-setting[schedule][mode]=&plugin-setting[schedule][timestamp]=",timestamp,"&page=1&limit=100&token=",sep="")
# https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
json <- jsonlite::fromJSON(url, flatten = TRUE)
pageOfData <- json$result$response$airport$pluginData$schedule$arrivals$data
# select() keeps and renames the columns in one step (new = old); calling
# dplyr:: explicitly avoids the rename() clash between plyr and dplyr
filteredData <- pageOfData %>%
  dplyr::select(ICAO = flight.airline.code.icao,
                Name = flight.airline.name,
                Origin = flight.airport.origin.name,
                `Origin ICAO` = flight.airport.origin.code.icao,
                Latitude = flight.airport.origin.position.latitude,
                Longitude = flight.airport.origin.position.longitude)
knitr::kable(filteredData, caption = "page 1 of arrival for BOD")
| ICAO | Name | Origin | Origin ICAO | Latitude | Longitude |
|---|---|---|---|---|---|
| VOE | Volotea | Dubrovnik Airport | LDDU | 42.56135 | 18.268240 |
| RYR | Ryanair | Milan Bergamo Il Caravaggio International Airport | LIME | 45.67388 | 9.704166 |
| AFR | Air France | Paris Charles de Gaulle Airport | LFPG | 49.01252 | 2.555752 |
| IBE | Iberia Regional | Madrid Barajas Airport | LEMD | 40.49355 | -3.566760 |
| AFR | Air France | Bastia Poretta Airport | LFKB | 42.55000 | 9.484722 |
| DLH | Lufthansa | Frankfurt Airport | EDDF | 50.02642 | 8.543125 |
| BMS | Blue Air | Bucharest Henri Coanda International Airport | LROP | 44.57216 | 26.102171 |
| VOE | Volotea | Palma de Mallorca Airport | LEPA | 39.55167 | 2.738808 |
| RAM | Royal Air Maroc | Marrakesh Menara Airport | GMMX | 31.60688 | -8.036300 |
| BAW | British Airways | London Gatwick Airport | EGKK | 51.14805 | -0.190270 |
| EZY | easyJet | Bristol Airport | EGGD | 51.38266 | -2.719080 |
| RYR | Ryanair | Rome Ciampino Airport | LIRA | 41.79936 | 12.594930 |
| EZY | EasyJet | Geneva International Airport | LSGG | 46.23806 | 6.108950 |
| HOP | HOP! | Ajaccio Napoleon Bonaparte Airport | LFKJ | 41.92388 | 8.802500 |
| EZY | EasyJet | London Gatwick Airport | EGKK | 51.14805 | -0.190270 |
| EZY | easyJet | Tel Aviv Ben Gurion International Airport | LLBG | 32.01138 | 34.886662 |
| AFR | Air France | Paris Charles de Gaulle Airport | LFPG | 49.01252 | 2.555752 |
| KLM | KLM | Amsterdam Schiphol Airport | EHAM | 52.30861 | 4.763889 |
| EZY | EasyJet | Catania Fontanarossa Airport | LICC | 37.46678 | 15.066400 |
| HOP | HOP! | Figari Sud-Corse Airport | LFKF | 41.50222 | 9.096667 |
| EZY | EasyJet | Marrakesh Menara Airport | GMMX | 31.60688 | -8.036300 |
| VOE | Volotea | Bastia Poretta Airport | LFKB | 42.55000 | 9.484722 |
| VOE | Volotea | Ajaccio Napoleon Bonaparte Airport | LFKJ | 41.92388 | 8.802500 |
| VOE | Volotea | Tenerife South Airport | GCTS | 28.04447 | -16.572399 |
| EZY | EasyJet | Basel Mulhouse-Freiburg EuroAirport | LFSB | 47.59890 | 7.528300 |
| IBE | Iberia | Madrid Barajas Airport | LEMD | 40.49355 | -3.566760 |
| VLG | Vueling | Barcelona El Prat Airport | LEBL | 41.29707 | 2.078463 |
| BAW | British Airways | London Gatwick Airport | EGKK | 51.14805 | -0.190270 |
| AFR | Air France | Paris Charles de Gaulle Airport | LFPG | 49.01252 | 2.555752 |
| AFR | Air France | Paris Orly Airport | LFPO | 48.72333 | 2.379444 |
| VOE | Volotea | Split Airport | LDSP | 43.53894 | 16.297960 |
| EZY | EasyJet | Lyon Saint Exupery Airport | LFLL | 45.71964 | 5.089108 |
| VOE | Volotea | Figari Sud-Corse Airport | LFKF | 41.50222 | 9.096667 |
| RYR | Ryanair | London Stansted Airport | EGSS | 51.88500 | 0.235000 |
| BTI | Air Baltic | Riga International Airport | EVRA | 56.92361 | 23.971109 |
| AFR | Air France | Paris Charles de Gaulle Airport | LFPG | 49.01252 | 2.555752 |
| TAR | Tunisair | Tunis Carthage International Airport | DTTA | 36.85103 | 10.227210 |
| EZY | EasyJet | Venice Marco Polo Airport | LIPZ | 45.50527 | 12.351940 |
| EZY | EasyJet | London Gatwick Airport | EGKK | 51.14805 | -0.190270 |
| TAR | Tunisair | Djerba Zarzis International Airport | DTTJ | 33.87503 | 10.775460 |
| KLM | KLM | Amsterdam Schiphol Airport | EHAM | 52.30861 | 4.763889 |
| TAP | TAP Portugal | Lisbon Humberto Delgado Airport | LPPT | 38.78131 | -9.135910 |
| AFR | Air France | Paris Orly Airport | LFPO | 48.72333 | 2.379444 |
| EZY | EasyJet | Faro Airport | LPFR | 37.01442 | -7.965910 |
| BEE | Flybe | Birmingham Airport | EGBB | 52.45385 | -1.748020 |
| THY | Turkish Airlines | Istanbul Ataturk International Airport | LTBA | 40.97692 | 28.814600 |
| HOP | HOP! | Nice Cote d’Azur Airport | LFMN | 43.66527 | 7.215000 |
| EZY | EasyJet | Milan Malpensa Airport | LIMC | 45.63060 | 8.728111 |
| AFR | Air France | Paris Charles de Gaulle Airport | LFPG | 49.01252 | 2.555752 |
| IBE | Iberia | Madrid Barajas Airport | LEMD | 40.49355 | -3.566760 |
| VOE | Volotea | Palma de Mallorca Airport | LEPA | 39.55167 | 2.738808 |
| EZY | EasyJet | Luxembourg Findel Airport | ELLX | 49.62333 | 6.204444 |
| VOE | Volotea | Ajaccio Napoleon Bonaparte Airport | LFKJ | 41.92388 | 8.802500 |
| EZY | EasyJet | Bristol Airport | EGGD | 51.38266 | -2.719080 |
| HOP | HOP! | Rome Leonardo da Vinci Fiumicino Airport | LIRF | 41.80447 | 12.250790 |
| EZY | EasyJet | Lisbon Humberto Delgado Airport | LPPT | 38.78131 | -9.135910 |
| VOE | Volotea | Alicante Airport | LEAL | 38.28216 | -0.558150 |
| RYR | Ryanair | Brussels South Charleroi Airport | EBCI | 50.46000 | 4.452778 |
| AFR | Air France | Paris Orly Airport | LFPO | 48.72333 | 2.379444 |
| EZY | EasyJet | Barcelona El Prat Airport | LEBL | 41.29707 | 2.078463 |
| VOE | Volotea | Venice Marco Polo Airport | LIPZ | 45.50527 | 12.351940 |
| EIN | Aer Lingus | Dublin Airport | EIDW | 53.42138 | -6.270000 |
| HOP | HOP! | Marseille Provence Airport | LFML | 43.43666 | 5.215000 |
| KLM | KLM | Amsterdam Schiphol Airport | EHAM | 52.30861 | 4.763889 |
| EZY | EasyJet | London Luton Airport | EGGW | 51.87472 | -0.368330 |
| EZY | EasyJet | Nice Cote d’Azur Airport | LFMN | 43.66527 | 7.215000 |
| HOP | HOP! | Lyon Saint Exupery Airport | LFLL | 45.71964 | 5.089108 |
| EZY | EasyJet | Geneva International Airport | LSGG | 46.23806 | 6.108950 |
| DAH | Air Algerie | Algiers Houari Boumediene Airport | DAAG | 36.69101 | 3.215408 |
| AFR | Air France | Paris Charles de Gaulle Airport | LFPG | 49.01252 | 2.555752 |
| HOP | HOP! | Lyon Saint Exupery Airport | LFLL | 45.71964 | 5.089108 |
| EZY | EasyJet | Berlin Schonefeld Airport | EDDB | 52.38000 | 13.522500 |
| IBE | Iberia | Madrid Barajas Airport | LEMD | 40.49355 | -3.566760 |
| EZY | easyJet | Brussels Airport | EBBR | 50.90138 | 4.484444 |
| RAM | Royal Air Maroc | Casablanca Mohammed V International Airport | GMMN | 33.36746 | -7.589960 |
| AFR | Air France | Paris Orly Airport | LFPO | 48.72333 | 2.379444 |
| VOE | Volotea | Strasbourg Airport | LFST | 48.54361 | 7.637222 |
| DLH | Lufthansa | Frankfurt Airport | EDDF | 50.02642 | 8.543125 |
| EZY | EasyJet | Geneva International Airport | LSGG | 46.23806 | 6.108950 |
| EZY | EasyJet | Lille Airport | LFQQ | 50.56333 | 3.086944 |
| HOP | HOP! | Lille Airport | LFQQ | 50.56333 | 3.086944 |
| VLG | Vueling | Barcelona El Prat Airport | LEBL | 41.29707 | 2.078463 |
| NAX | Norwegian | Oslo Gardermoen Airport | ENGM | 60.19391 | 11.100360 |
| SWR | Swiss | Zurich Airport | LSZH | 47.46472 | 8.549167 |
| VOE | Volotea | Pisa Galileo Galilei Airport | LIRP | 43.68391 | 10.392750 |
| EZY | EasyJet | Lyon Saint Exupery Airport | LFLL | 45.71964 | 5.089108 |
| HOP | HOP! | Marseille Provence Airport | LFML | 43.43666 | 5.215000 |
| EZY | EasyJet | Amsterdam Schiphol Airport | EHAM | 52.30861 | 4.763889 |
| AFR | Air France | Paris Charles de Gaulle Airport | LFPG | 49.01252 | 2.555752 |
| AFR | Air France | Paris Orly Airport | LFPO | 48.72333 | 2.379444 |
| HOP | HOP! | Lyon Saint Exupery Airport | LFLL | 45.71964 | 5.089108 |
| EZY | EasyJet | Lyon Saint Exupery Airport | LFLL | 45.71964 | 5.089108 |
| EZY | EasyJet | Basel Mulhouse-Freiburg EuroAirport | LFSB | 47.59890 | 7.528300 |
| EZY | EasyJet | London Gatwick Airport | EGKK | 51.14805 | -0.190270 |
| BEL | Brussels Airlines | Brussels Airport | EBBR | 50.90138 | 4.484444 |
| EZY | EasyJet | Geneva International Airport | LSGG | 46.23806 | 6.108950 |
| EZY | EasyJet | Nice Cote d’Azur Airport | LFMN | 43.66527 | 7.215000 |
| HOP | HOP! | Lyon Saint Exupery Airport | LFLL | 45.71964 | 5.089108 |
| AFR | Air France | Paris Orly Airport | LFPO | 48.72333 | 2.379444 |
| EZY | EasyJet | Lille Airport | LFQQ | 50.56333 | 3.086944 |
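The table above holds only page 1. To gather every page, one option is to request consecutive page numbers and stack the results. The sketch below assumes that a request past the last page returns no rows. The page field described earlier probably gives the page count directly, but its exact sub-fields are not shown in this tutorial. The name collect_pages() and its max_pages and pause defaults are hypothetical.

```r
# Sketch: collect consecutive pages until one comes back empty (assumption)
# or max_pages is reached, then stack them into one data.frame.
# fetch_page(p) is any function returning the data.frame for page p.
collect_pages <- function(fetch_page, max_pages = 20, pause = 1) {
  pages <- list()
  for (p in seq_len(max_pages)) {
    df <- fetch_page(p)
    if (is.null(df) || NROW(df) == 0) break  # assumed end of the data
    pages[[length(pages) + 1]] <- df
    Sys.sleep(pause)                         # stay polite with the server
  }
  if (length(pages) == 0) return(data.frame())
  do.call(rbind, pages)  # dplyr::bind_rows() is more tolerant of differing columns
}

# Usage sketch, reusing the URL pattern and timestamp from above:
# fetch_bod <- function(p) {
#   url <- paste0("https://api.flightradar24.com/common/v1/airport.json?code=bod&plugin[]=&plugin-setting[schedule][mode]=&plugin-setting[schedule][timestamp]=", timestamp, "&page=", p, "&limit=100&token=")
#   jsonlite::fromJSON(url, flatten = TRUE)$result$response$airport$pluginData$schedule$arrivals$data
# }
# allArrivals <- collect_pages(fetch_bod)
```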
This is the last and probably the most complex part of this big tutorial.
In a real web-scraping project there are two possible use cases: a one-shot harvest, or a recurring (daily, monthly, etc.) harvest.
If you need to collect one year of data on a daily basis, you cannot rely on your personal computer. You need to deploy and run your script on a remote server.
To be very brief, Docker is a technology that encapsulates your software in an isolated (and ideally immutable) container running on top of your system. The concept is similar to a virtual machine (VM), but more efficient.
So we use Docker containers to encapsulate a web-scraping script. You can then save your image and launch it on a web server.
There are three main steps to understand in a container's lifecycle:
First, we describe the composition of an image in a Dockerfile, using Docker's dedicated syntax. It's like a recipe in a cookbook; you can find lots of recipes on Docker Hub.
Next, as with a recipe in real life, you need to turn it into a delicious cake: the image must be built before it can be used.
Finally, you run the built image.
DOCKER installation
On Ubuntu Linux, you can find the documentation here. First, install the packages needed to add the Docker key and repository:
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
Add the key and repository:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update
Install docker-ce:
sudo apt-get install docker-ce
PREPARE image
Copy the docker-images folder from the USB key (ask your teachers) into the scrap-flightradar folder of this tutorial.
Now go to this folder in a terminal (cd pathofthefolder) and load the two images onto your system:
sudo docker load --input=r-alpine.tar
sudo docker load --input=rSelenium.tar
BUILD image
Go to the docker-scripts folder, inside the folder that contains this tutorial on your disk.
Building this image takes a long time (about ten minutes), mainly because of the large dplyr library. Run the docker build command in the folder that contains the Dockerfile describing the image:
docker build . --tag=rflightscraps
LAUNCH IMAGE
Create a localbackup folder, then run the rflightscraps container with the correct path:
mkdir localbackup
docker run --name rflightscraps -d -e UID=1000 -e GID=1000 --mount type=bind,source=$(pwd)/localbackup,destination=/usr/local/src/flight-scrap/docker-scripts/data rflightscraps --name rflightscraps
To check that your container is running and read its execution logs:
sudo docker ps
sudo docker logs rflightscraps
To see the results of the automatic harvesting, list the contents of the docker-scripts/localbackup folder with the ls command. You will see a list of CSV files, one per harvest, made every minute. To change this schedule, edit the crontab file following cron syntax, then rebuild and relaunch the image (this is faster, because only one file changes and nothing needs to be recompiled).
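The harvesting script inside the image is not reproduced in this tutorial, so what follows is only a hypothetical sketch of the two ends of that pipeline. save_harvest() writes one timestamped CSV per run, and its harvest_YYYY-MM-DD_HH-MM-SS.csv file-name pattern is our assumption. read_harvests() gathers every CSV in the localbackup folder into one data.frame for analysis, recording which file each row came from.

```r
# Hypothetical: write one harvest to a timestamped CSV (one file per cron run).
save_harvest <- function(df, dir, time = Sys.time()) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  file <- file.path(dir, paste0("harvest_", format(time, "%Y-%m-%d_%H-%M-%S"), ".csv"))
  write.csv(df, file, row.names = FALSE)
  file
}

# Read every CSV in the folder back into one data.frame, adding a
# source_file column that tells which harvest each row came from.
read_harvests <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) return(data.frame())
  parts <- lapply(files, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    df$source_file <- rep(basename(f), nrow(df))  # also safe for 0-row files
    df
  })
  do.call(rbind, parts)
}

# Usage: harvests <- read_harvests("localbackup")
```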
Alternatively, create a named volume, independent of the host filesystem:
docker volume create --name myDataVolume
docker volume ls
Mount the volume:
docker run --mount type=volume,source=myDataVolume,destination=/usr/local/src/flight-scrap/docker-scripts/data rflightscraps
Export the data:
Using an alpine image, we mount the named volume (myDataVolume) on an /alpine_data folder inside the alpine container.
Then we create a new folder named /alpine_backup inside the alpine container.
We then create an archive of the contents of the /alpine_data folder and store it in the /alpine_backup folder (inside the container).
We also mount the /alpine_backup folder of the container onto the Docker host (your local machine), in a folder named local_backup inside the current directory.
docker run --rm -v myDataVolume:/alpine_data -v $(pwd)/local_backup:/alpine_backup alpine:latest tar cvf /alpine_backup/scrap_data_"$(date '+%y-%m-%d')".tar /alpine_data
There are two ways to install Docker for Windows: a new way (https://docs.docker.com/docker-for-windows/install/) and an old way. For this tutorial we use the old way, for better compatibility.
Install Docker Toolbox for Windows using the DockerToolbox.exe file. The official documentation is available here.
You can then launch the Docker Quickstart Terminal directly after installation, or from its icon in the Start menu.
Docker first downloads an ISO image, then tests whether your system is ready to run containers. If you see an error like this one, you need an extra step.
Restart your computer and enable an option in the BIOS (press the Del key while your computer starts), probably named “Vanderpool technology”, “VT-x technology” or “Virtualization technology”. Save and restart. Here are some pictures of UEFI BIOS screens on HP, Dell and ASUS motherboards/systems:
Asus
Dell
HP
PREPARE image
Copy the docker-scripts and docker-images folders into c:\Program Files\Docker Toolbox.
You will then see these folders in the Docker Toolbox terminal.
Go to the docker-images folder using the cd command, and load the two images:
docker load --input=r-alpine.tar
docker load --input=rSelenium.tar
BUILD image
Go to the docker-scripts folder, inside the folder that contains this tutorial on your disk.
Building this image takes a long time (about ten minutes), mainly because of the large dplyr library. Run the docker build command in the folder that contains the Dockerfile describing the image:
docker build . --tag=rflightscraps
LAUNCH IMAGE
We use a bind mount, which is currently the easiest way.
First, create a new folder named localbackup in your Windows user folder (C:\Users\yourname). Then replace the path in the following command with yours and run it.
docker run --name rflightscraps -d -e UID=1000 -e GID=1000 --mount type=bind,source=/c/Users/reyse/localbackup,destination=/usr/local/src/flight-scrap/docker-scripts/data rflightscraps --name rflightscraps
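The command above writes the Windows folder C:\Users\reyse\localbackup as /c/Users/reyse/localbackup. If you prepare this path from R, a small helper can do the conversion. This is a sketch. The name toolbox_path() is hypothetical, and it only mirrors the drive-letter convention visible in that command.

```r
# Hypothetical helper: convert C:\Users\yourname\localbackup into the
# /c/Users/yourname/localbackup form used in the docker run command above.
toolbox_path <- function(win_path) {
  p <- gsub("\\", "/", win_path, fixed = TRUE)    # backslashes -> slashes
  sub("^([A-Za-z]):", "/\\L\\1", p, perl = TRUE)  # "C:" -> "/c" (\L lowercases)
}

toolbox_path("C:\\Users\\yourname\\localbackup")
# "/c/Users/yourname/localbackup"
```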
The end: close the session!
remDr$close()